A Named-Entity Recognition Training Method Using Bagging-Based Bootstrapping (배깅 기반의 부트스트래핑을 이용한 개체명 인식 학습 기법)

Yujin Jeong (정유진); Juae Kim (김주애); Youngjoong Ko (고영중); Jungyun Seo (서정연)

정보과학회논문지 (Journal of KIISE)

Korean Title	배깅 기반의 부트스트래핑을 이용한 개체명 인식 학습 기법
English Title	A Named-Entity Recognition Training Method Using Bagging-Based Bootstrapping
Author	Yujin Jeong (정유진), Juae Kim (김주애), Youngjoong Ko (고영중), Jungyun Seo (서정연)
Citation	Vol. 45, No. 8, pp. 825-830 (August 2018)
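Korean Abstract	Most existing named-entity recognition research is based on supervised learning. Supervised named-entity recognition performs well, but building a large labeled corpus takes considerable time and cost. To address this problem, this paper proposes a training method for named-entity recognition models that uses labels generated automatically by a named-entity recognition model, without the effort of manually annotating a large corpus. Because the proposed method automatically generates a large number of named-entity labels from only a small labeled corpus and uses them for training, it greatly reduces the time and cost needed to create a large labeled corpus. In addition, bagging is used to remove errors from the automatically generated labels. With bootstrapping and bagging added, the model reached a best F1 score of 70.76%. The baseline CRF named-entity recognition model used for comparison scored an F1 of 65.59%.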
English Abstract	Most previous named-entity (NE) recognition studies have been based on supervised learning methods. Although supervised NE recognition performs well, constructing a large labeled corpus takes considerable time and cost. In this paper, we propose an NE recognition training method that addresses this problem by using an automatically generated labeled corpus. Because the proposed method trains on a large machine-labeled corpus, it greatly reduces the time and cost of labeling a corpus manually. In addition, a bagging-based bootstrapping technique is applied to correct errors in the machine-labeled data. Experimental results show that, with the bagging-based bootstrapping technique added, the proposed method achieves a best F1 score of 70.76%, which is 5.17%p higher than that of the baseline system.
Keyword	Named-entity recognition; bootstrapping; bagging; CRF; semi-supervised learning; corpus generation
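
The abstract describes the training loop only at a high level, so the following is a minimal sketch of how bagging-based bootstrapping might look, not the authors' implementation. It rests on several assumptions that are not in the record: a toy word-lookup tagger stands in for the CRF model, a machine-labeled sentence is kept only when every bagged model produces the same tag sequence, and all function names (`train_tagger`, `train_ensemble`, `agreed_tags`, `bootstrap`) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_tagger(sentences):
    """Toy stand-in for a CRF (assumption): most frequent tag per word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def predict(model, words):
    # Words never seen in training default to "O" (outside any entity).
    return [model.get(word, "O") for word in words]


def train_ensemble(labeled, n_models, rng):
    """Bagging: train each model on a bootstrap resample of the labeled set."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(labeled) for _ in labeled]  # sample with replacement
        models.append(train_tagger(sample))
    return models


def agreed_tags(models, words):
    """Return the tag sequence if all models agree on it, otherwise None.

    Unanimous agreement is an assumed filtering rule, chosen for simplicity.
    """
    predictions = [predict(model, words) for model in models]
    first = predictions[0]
    return first if all(p == first for p in predictions) else None


def bootstrap(seed, unlabeled, rounds=2, n_models=5, rng_seed=0):
    """Grow the labeled set from a small seed corpus over several rounds."""
    rng = random.Random(rng_seed)
    labeled = list(seed)   # small hand-labeled corpus
    pool = list(unlabeled)  # sentences without labels
    for _ in range(rounds):
        if not pool:
            break
        models = train_ensemble(labeled, n_models, rng)
        remaining = []
        for words in pool:
            tags = agreed_tags(models, words)
            if tags is None:
                remaining.append(words)  # models disagree: leave unlabeled
            else:
                labeled.append(list(zip(words, tags)))  # accept machine labels
        pool = remaining
    return train_tagger(labeled), labeled, pool
```

Calling `bootstrap(seed, unlabeled)` returns the final model, the grown labeled set, and the sentences that were never accepted. A majority vote or a confidence threshold would be alternative filtering rules; the record does not say which one the paper uses.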